Graph RAG for Criminal Investigations
Combining Knowledge Graphs and NLP to Analyze Instant Messaging Data in Criminal Investigations.
This scientific publication was produced in collaboration with the DatAI Lab at the University of Milano-Bicocca during my visiting period as a Research Student funded by a CINI scholarship.
Criminal investigations often involve analyzing messages exchanged through instant messaging apps such as WhatsApp, which can be an extremely labor-intensive task. Our approach integrates knowledge graphs and NLP models to support this analysis by semantically enriching data collected from suspects’ mobile phones, helping prosecutors and investigators search the data and gain valuable insights. Our semantic enrichment process involves extracting message data and modeling it as a knowledge graph, generating transcriptions of voice messages, and annotating the data using an end-to-end entity extraction approach. We adopt two different solutions to help users gain insights into the data: one based on querying and visualizing the graph, and one based on semantic search. The proposed approach ensures that users can verify the information by accessing the original data. Although we report early results and prototypes developed in the context of an ongoing project, our proposal has already been applied in practice to real investigation data. As a consequence, we had the chance to interact closely with prosecutors, collecting positive feedback and identifying interesting opportunities and promising research directions to share with the research community.
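To make the graph-modeling step more concrete, here is a minimal sketch of how one extracted message could be turned into a parameterized Cypher statement. It is an illustration under stated assumptions, not the schema used in the paper: the `Message` fields, the node labels (`Person`, `Chat`, `Message`), the relationship types (`SENT`, `IN_CHAT`) and the `source_file` property are all hypothetical. The `source_file` property is included only to show one way a node could point back to the original data for verification.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """One extracted message. All field names are hypothetical."""
    message_id: str
    chat_id: str
    sender: str           # e.g. the sender's phone number
    timestamp: str        # ISO 8601 string
    text: Optional[str]   # message body, or a transcription for voice messages
    source_file: str      # where the original record lives, for provenance
    is_voice: bool = False


# Hypothetical graph schema: (Person)-[:SENT]->(Message)-[:IN_CHAT]->(Chat).
# MERGE keeps the statement idempotent, so re-running ingestion on the same
# record does not create duplicate nodes or relationships.
MERGE_MESSAGE = """
MERGE (p:Person {phone: $sender})
MERGE (c:Chat {id: $chat_id})
MERGE (m:Message {id: $message_id})
SET m.timestamp = $timestamp,
    m.text = $text,
    m.is_voice = $is_voice,
    m.source_file = $source_file
MERGE (p)-[:SENT]->(m)
MERGE (m)-[:IN_CHAT]->(c)
""".strip()


def to_cypher(msg: Message) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher statement and its parameters for one message."""
    params = {
        "message_id": msg.message_id,
        "chat_id": msg.chat_id,
        "sender": msg.sender,
        "timestamp": msg.timestamp,
        "text": msg.text,
        "is_voice": msg.is_voice,
        "source_file": msg.source_file,
    }
    return MERGE_MESSAGE, params
```

Passing values as parameters rather than formatting them into the query string keeps message text, which is arbitrary user content, out of the Cypher source itself.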
My contributions to the paper and the associated investigations:
- Ingested and structured a large volume of data extracted from a suspect’s mobile phone into Neo4j.
- Supported the deployment of an entity-recognition and linking pipeline on the ingested data.
- Evaluated the quality of the entity-recognition and linking pipeline.
- Applied automatic speech-recognition algorithms to convert WhatsApp voice messages into text, enabling downstream entity recognition and linking.
- Developed an interactive terminal that allows prosecutors to navigate the graph without writing Cypher queries.
- Co-authored the manuscript.
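
As a rough illustration of the idea behind the interactive terminal, the sketch below maps numbered menu options to predefined, parameterized Cypher templates, so the user only picks an option and types a value. This is not the actual tool: the menu entries, the graph schema (`Person`, `Message`, `Entity`, `SENT`, `MENTIONS`) and the function names are assumptions made for the example, and query execution is delegated to a caller-supplied `run_query` function so the sketch does not depend on a database.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# Hypothetical menu: option key -> (description, Cypher template).
# Every template takes a single $value parameter supplied by the user.
QUERIES: Dict[str, Tuple[str, str]] = {
    "1": (
        "Messages sent by a phone number",
        "MATCH (p:Person {phone: $value})-[:SENT]->(m:Message) "
        "RETURN m.id AS id, m.timestamp AS timestamp, m.text AS text "
        "ORDER BY m.timestamp",
    ),
    "2": (
        "Messages mentioning an entity",
        "MATCH (m:Message)-[:MENTIONS]->(e:Entity {name: $value}) "
        "RETURN m.id AS id, m.timestamp AS timestamp, m.text AS text "
        "ORDER BY m.timestamp",
    ),
}


def render_menu() -> str:
    """Build the menu text shown to the user."""
    lines = [f"{key}) {desc}" for key, (desc, _) in sorted(QUERIES.items())]
    return "\n".join(lines)


def build_query(choice: str, value: str) -> Tuple[str, Dict[str, Any]]:
    """Translate a menu choice and a user value into Cypher plus parameters."""
    if choice not in QUERIES:
        raise ValueError(f"unknown option: {choice!r}")
    _, cypher = QUERIES[choice]
    return cypher, {"value": value}


def main_loop(
    run_query: Callable[[str, Dict[str, Any]], Iterable[Any]],
    read: Callable[[str], str] = input,
    write: Callable[[Any], None] = print,
) -> None:
    """Prompt for options until the user enters 'q'."""
    while True:
        write(render_menu())
        choice = read("Option (q to quit): ").strip()
        if choice == "q":
            break
        if choice not in QUERIES:
            write("Unknown option, please try again.")
            continue
        value = read("Value: ").strip()
        cypher, params = build_query(choice, value)
        for row in run_query(cypher, params):
            write(row)
```

With the official Neo4j Python driver, `run_query` could be a thin wrapper around `session.run(cypher, params)`. Injecting it as an argument also makes the loop easy to exercise with canned input, without a running database.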